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Abstract. We consider optimal non-sequential designs for a large class of (lin- 
ear and nonlinear) regression models involving polynomials and rational functions 
with heteroscedastic noise also given by a polynomial or rational weight func- 
tion. The proposed method treats D-, E-, A-, and 4> p -optimal designs in a unified 
manner, and generates a polynomial whose zeros are the support points of the 
optimal approximate design, generalizing a number of previously known results of 
the same flavor. The method is based on a mathematical optimization model that 
can incorporate various criteria of optimality and can be solved efficiently by well 
established numerical optimization methods. In contrast to previous optimization- 
based methods proposed for similar design problems, it also has theoretical guar- 
antee of its algorithmic efficiency; in fact, the running times of all numerical 
examples considered in the paper are negligible. The stability of the method is 
demonstrated in an example involving high degree polynomials. After discussing 
linear models, applications for finding locally optimal designs for nonlinear regres- 
sion models involving rational functions arc presented, then extensions to robust 
regression designs, and trigonometric regression are shown. As a corollary, an 
upper bound on the size of the support set of the minimally-supported optimal 
designs is also found. The method is of considerable practical importance, with 
the potential for instance to impact design software development. Further study 
of the optimality conditions of the main optimization model might also yield new 
theoretical insights. 



1. Introduction 

This paper is concerned with optimal approximate designs for polynomial and 
rational regression models with heteroscedastic error modeled by a rational weight 
function. In our focus is the general linear model 

m 

y(i) = £>/<(*) + tel, (l) 

i=l 

where each fa is a known rational function defined on X, and the error e(t) is a 
normally distributed random variable with mean zero and variance o~ 2 (t) = l/u(t), 
where the known weight function u is a rational function whose numerator and 
denominator are both positive on X. We are interested in experiments designed to 
help estimate the unknown parameters 0j. The design space X is the finite union of 
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closed, bounded intervals in R, also allowing singletons as degenerate intervals. We 
assume that observations are uncorrelated, and that the /j are linearly independent. 

Our main result is a characterization of the support of the D-, E-, A-, and $ p - 
optimal designs as the optimal solutions of a semidefinite optimization problem. 
This directly translates to a method to numerically determine the optimal design, 
using readily available optimization software. The characterization is applicable to 
every linear model involving polynomials and rational functions with heteroscedastic 
noise also given by a polynomial or rational weight function. We demonstrate that 
the method is numerically robust (in the sense that it can handle ill-conditioned 
problems, such as those involving polynomials of high degree), and has very short 
running time on problems of practical size. 

Optimal designs for Fourier regression models and locally optimal designs for 
certain nonlinear models can also be found with similar methods. 

In many cases the experimenter is interested only in certain linear combinations 
of the parameter vector 9 := (9i, . . . , m ) T , which are given by the components of 
K T 6 for some m x s matrix K. In the presentation of our approach it is convenient 
to assume that our goal is to estimate the entire parameter vector, that is K = I m 
(the m x m identity matrix), and that the design space contains enough points to 
make all parameters estimable. (If K = I m , the latter assumption means that there 
is a design whose information matrix is non-singular, see later.) In Section 6 we 
show how the proposed method can be generalized to handle problems with general 
K. 

Much attention has been devoted to optimal designs for special cases of model (1). 
It is well known that when the design space X is finite, the D-, E-, and A-optimal 
approximate designs can be found by convex optimization even for arbitrary /j's, 
see, for example [4, Chapter 7] , or a generalization of this approach to multi-response 
experiments in [2]. However, when X is an interval, considerable difficulties arise, as 
the finite support of the optimal design also has to be characterized. 

A popular approach in the literature is that a polynomial is sought whose roots are 
the support points of the optimal design. For instance, as discovered by Guest [18] 
and Hoel [21], the D-optimal design for ordinary polynomial regression, when fa = t l , 
and w is a positive constant, on X = [—1, 1] is the one that assigns uniform weights 
to each of the zeros of t — > (1 — t 2 )-^L m (t), where L m is the Legendre polynomial of 
degree m. The number of support points had already been determined in [7]. Similar 
characterizations are known for A- and E-optimal designs for polynomial regression, 
see, for example the classic monographs [14, 33]. Another common approach is to 
determine the canonical moments of the optimal design [11, 12]. Further optimality 
criteria for polynomial models, and closed-form characterizations of the optimal 
designs for linear and quadratic models, are discussed in [36]. See also [24] for E- 
optimal designs for linear models with rational functions fi(t) = (t — oti)~ l with 
oii X. The Optimum Experimental Design website [1] also contains a rather 
comprehensive list of solved models, along with an impressive, and continuously 
maintained, list of references. 
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More recently considerable attention has been paid to polynomial models with 
missing interactions, also called incomplete or improper polynomial models. Repre- 
sentative results include [8], which gives D-optimal designs when only odd or only 
even degree terms appear in the model; [22] and [5], which consider D- and E- 
optimal designs (respectively) for polynomial models with zero constant term; [11], 
which considers D-optimal designs, also for some multivariate problems, over the 
unit cube under less restrictive assumptions on the missing terms; and [13], which 
gives D-optimal designs when only the lowest degree terms, up to a fixed degree 
m', are absent. Note that even the union of these methods does not yield a com- 
plete solution to incomplete polynomial models, even for univariate regression with 
homoscedastic error. 

Results in the heteroscedastic case are even more scarce and typically less general. 
For instance, [23] is devoted to D-optimal designs for polynomial regression over [0, b] 
with the weight function u(t) = t/(l + t). 

The design space X is almost always a (closed, bounded) interval, which is prob- 
ably sufficient for most applications. Imhof and Studden [24] also considered some 
rational models when X is the union of two disjoint intervals. 

Most of the above results are based on the theory of orthogonal polynomials, 
canonical moments [12], and Chebyshev systems [25]. They are rather specific in 
their scope, and generalization of their proofs appears to be difficult. On the other 
hand, most of them yield numerically very efficient methods for computing nu- 
merically optimal designs. The bottleneck in these methods is either polynomial 
root-finding, which can be carried out in nearly linear time in the degree of the 
polynomial [32], or the reconstruction of a measure on finite support from its canon- 
ical moments, which can also be carried out relatively easily [12]. An exception 
is the method of [13], which involves finding the global maximum of a multivariate 
polynomial (even though it is concerned with univariate polynomial regression only). 
This is an NP-hard problem even in very restricted classes of polynomials, and is 
known to be very difficult to solve in practice even when the number of variables 
and the degree are rather small [20]. 

In the pursuit of more widely applicable methods, some of the attention has 
turned to the numerical solution of optimization models that characterize optimal 
designs. Pukelsheim's monograph [33] is a comprehensive overview of optimal design 
problems with an optimization-oriented viewpoint, but it is not concerned with 
algorithms or numerical computations. Most numerical methods proposed in the 
literature are variants of the popular coordinate- exchange method from [29], which 
is a variant of the classic Gauss-Seidel method (also known as coordinate descent 
method) used in derivative-free optimization. These algorithms maintain a finite 
working set of support points, and iteratively replace one of the support points by 
another one from X if the optimal design on the new support set is better than 
that of the current support set. See [6] for a recent variant of this idea for finding 
approximate D-optimal designs. 
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However, this approach has serious drawbacks, and care has to be taken not to 
abuse them: (i) some variants require that the size of the minimally supported op- 
timal design be known a priori; (ii) no bound is known on the number of iterations 
the algorithm might take; (iii) in fact, the number of iterations of the coordinate de- 
scent method is known to be quite high in practice even for some very simple convex 
optimization problems [31, Chapter 9]; and (iv) the coordinate descent method does 
not necessarily converge at all if the function being optimized is not continuously 
different iable [35]. Hence, these methods can hardly be considered a completely 
satisfactory solution of most polynomial regression problems, even though some 
successful numerical experiments have been reported, cf. [6]. 

This paper proposes a different approach to linear regression models involving 
polynomials and rational functions. Motivated in part by the approach of [4], it is 
also based on an optimization model involving linear matrix inequalities, which can 
be solved efficiently, both in theory and in practice, by readily available optimization 
software. 

The novelty of the proposed method is that it does not work with the support 
points directly, as existing numerical methods, such as the coordinate-exchange 
method, do. Instead, it follows some of the previous symbolic approaches by com- 
puting the coefficients of a polynomial whose zeros are the support points of the 
optimal design. 

After introducing the problem formally, we derive our main theorems in Section 3 
for the estimation of the full parameter vector 9. Illustrative examples are presented 
in Section 4. Section 6 is concerned with the more general case, when only a subset of 
the parameters (or their linear combinations) need to be estimated. We then apply 
these results to finding locally optimal designs for nonlinear models in Section 7. 
Finally, in Section 8 we give an outlook to models of regression involving other 
functions than rational functions. 

Notation. We will make use of the following, mostly standard, notations: degp de- 
notes the degree of the polynomial p, 1cm stands for the least common multiple 
of polynomials. The denominator of a rational function r is denoted by den(r). 
The positive part function is denoted by (•)+■ The brackets (•, •) denote the usual 
(Frobenius) inner product of vectors and matrices, that is, (A, B) = Ylij ^ij-^ij- 
Since many decision variables in the paper are matrices, linear constraints on ma- 
trices are written in operator form. For example, a linear equality constraint on an 
unknown matrix X will be written as A(X) = b (where A is a linear operator and b 
is a vector) to avoid the cumbersome "vec" notation necessary to use matrix- vector 
products. For the linear operator A, A* denotes its adjoint. The identity operator 
is written as id. 

The space of m x m symmetric matrices is denoted by § m , the cone of m x m 
positive semidefinite real symmetric matrices is §+. The Lowner partial order on 
S m , denoted by !>=, is the conic order generated by §+; in other words, we write 
A B when A-SeS™. 
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2. OPTIMALITY CRITERIA AND THEIR SEMIDEFINITE REPRESENTATIONS 

A design for infinite sample size (also called approximate design or design for 
short) is a finitely supported probability measure £ on I. Using the notation f(t) = 
(fi(t), . . ., f m (t)) T , the Fisher information matrix of 9 corresponding to the design 

e is 

M(0 = Jj(t)f(tfu;(t)dm- (2) 

Of course, this integral simplifies to a finite sum for every design. Note that for every 
£, M(£) G Sip. A design £ is considered optimal if M(£) is maximal with respect to 
the Lowner partial order (recall the end of the previous section); see [33, Chapter 
4] for detailed statistical interpretation. If $ is an Sip — y M. function, the design £ 

is called optimal with respect to $, or ^-optimal for short, if $(M(£)) is maximum. 
Again, only those criteria are interesting which are compatible with the Lowner 
partial order, that is functions $ satisfying > <&(B) whenever A )p B )p 0. 

Popular choices of $ include the following. 

(1) When $(M) = det(M), | is called D-optimal. 

(2) When $(M) = X\(M), the smallest eigenvalue of M, £ is called E-optimal. 

(3) When $(M) = — tr(M _1 ), where tr denotes matrix trace, £ is called ^4- 

(4) When $(M) = (tr(M p )) 1 / p , | is called %-optimal. 

For most purposes of the paper $ could be an arbitrary concave extended real 
valued function on Sip with finite values on the interior of Sip. However, to avoid 
certain technical difficulties, and in order to obtain good characterizations of optimal 
designs, we will assume that the $ of our choice is representable by linear matrix 
inequalities (LMIs) or semidefinite representable, this includes all of the criteria 
discussed above. The precise definitions we need are summarized next. 

Definition 1. A set S C W 1 is semidefinite representable if for some k > 1 and 
/ > there exist affine functions A : R n — y S k and C : R l — > S k such that the set S 
can be characterized by a linear matrix inequality in the following way: 

5 = {sGl n |3«Gt ; : A{s) + C{u) )? 0}. 

Note that the intersection of semidefinite representable sets is also semidefinite 
representable, so we could equivalently allow to have a characterization of the above 
form with p > 1 inequalities. The motivation behind the idea of semidefinite repre- 
sentable sets is that finding global optima of "nice" functions over them is easy, and 
a number of numerical methods are available to that in an efficient manner. "Nice" 
functions include semidefinite representable functions, defined below, in Definition 3. 

In this paper we will encounter two important instances of semidefinite repre- 
sentable sets: the coefficient vectors of polynomials that are nonnegative over an 
interval, and the level sets of the optimality criteria $. 
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Lemma 2 ([25, Chapter 2]). The set 

d 

P^ b] = {(p ,...,Pd): J>^>0 Wxe[a,b}} 

i=0 

of coefficient vectors of polynomials of degree d that are are nonnegative over the 
interval [a,b] is a semidefinite representable subset o/R d+1 . 

This is a reasonably well known theorem in probability and statistics owing to its 
application in moment problems [12], for completeness we provide a specific repre- 
sentation in the Appendix. The same assertion holds even if the polynomials are 
represented in another basis, not in the monomial basis, but the actual characteri- 
zation will, of course, be different. 

The next definition is necessary to define the class of optimality criteria our ap- 
proach can handle. 

Definition 3. A function $ : — > R is semidefinite representable if its (closed) 
upper level sets are semidefinite representable, that is, if for some ki, . . . ,k p and / 
there exist linear functions A{ : — > S fci , Ci : M 1 — > S fc % and matrices Bi G E> ki , 
Di G§ fc ' (i = 1, . . . ,p) such that for all X G S+ and z G R, $(X) > z holds if and 
only if 

Ai{X) + B t z + Ci{u) + A ^ z = l,...,p (3) 

for some u G R . 

As mentioned above, finding the optimal value (and the optimizer) of a semidef- 
inite representable function over a semidefinite representable set is generally easy; 
optimization problems of this form are called semidefinite optimization problems or 
semidefinite programs; see also the beginning of the next section. 

We will also need the following (technical) assumption on the relationship between 
the model (as defined by the functions /j and u) and the criterion function $. It is 
only used in the proof of the main theorem. 

Definition 4. We say that the semidefinite representable function $ : S!^ — > R is 
admissible with respect to the set X C if $ has a representation (3) for which 

there exists an X G X satisfying (3) with strict inequality for some z and u. That 
is to say that the left-hand side of each of the p inequalities can be made positive 
definite simultaneously for at least one X G X. 

This is a rather technical condition in the sense that most interesting functions 
$ are admissible with respect to every non-empty set X (a sufficient condition for 
this is that in the semidefinite representation of $ each Bi be positive or negative 
definite), or at least with respect to every X that contains a non-singular matrix. 

D-, E-, and A-optimality are all semidefinite representable, or are equivalent to 
other criteria given by semidefinite representable functions. The same holds for $ p - 
optimality. They are also admissible with respect to every set of Fisher information 
matrices for which the criteria is well-defined (see below). Note that all semidefinite 
representable functions are quasi-concave, continuous functions. 
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Example 5 (E-optimality). For every M G § m , Xi(M) > z if and only if M-zI fc= 
0, so Ai admits a simple semidefinite representation. In this representation p = 1, 
Ai = id, B\ = —I, C\ = 0, and D\ = 0, hence Ai is admissible with respect to every 
non-empty set of Fisher information matrices. 

Example 6 (A-optimality). It follows from Haynsworth's theorem [19] on the 

inertia of Hermitian block matrices that a symmetric block matrix ^ ^ T with 

positive definite block P is positive semidefinite if and only if its Schur complement, 
given by R — Q T P" 1 Q, is positive semidefinite. Let M G be invertible, for 
example an invertible Fisher-information matrix, and fix a k G {1, . . . , m}. Plugging 
in M for P, the kth unit vector for Q T , and a scalar w for R we have that 

(M -1 )^ < u if and only if f ^ e M fc j 0. This observation yields a semidefinite 

representation of A-optimality of the form (3) with p = m + 1: 

tr(M -1 ) < z iff 3 u±, . . . , u m : z > Ui, and ( ^ ^ J ^ 0, = 1, . . . , m. 

i 

It follows that the A-optimality criterion is admissible with respect to every set 
of Fisher information matrices that contains at least one non-singular matrix. 

Example 7 (D- and $ p -optimality). The cases of D-optimality and $ p -optimality 
are more complicated, but can also be fitted in the above framework. Owing to page 
limitations we can only give the flavor of this result, and pointers to the literature. 

D-optimality is equivalent to optimality with respect to the criterion $(M) = 
(det(M)) 1 / m , where m is the size of M. Note that this is the geometric mean of 
the eigenvalues of M. <3> p -optimality is expressed by the matrix mean $ p (M) = 

(tr(M p ))^ p = (Y^iLi K) 1 ^ i where Aj is the ith eigenvalue of M. Hence, both 
criteria are symmetric functions of the eigenvalues of M. Moreover, both the geo- 
metric mean and the p-norm, for every rational p > 1 are also semidefinite rep- 
resentable [3, Section 3.3.1]. Finally, we can invoke [3, Proposition 4.2.1], which 
states that for every semidefinite representable symmetric g : R m — > K., the function 
$(M) = g(Ai(M), . . . , A m (M)) is also semidefinite representable. 

D- and $ p -optimality are also admissible with respect to every set of Fisher in- 
formation matrices that contains at least one non-singular matrix. 

Another interesting optimality criterion, not considered in this paper, is the max- 
imin efficient criterion. Models for which maximin efficient approximately optimal 
designs can be found using semidefinite programming (this includes polynomial mod- 
els) can be found in the recent technical report [15]. 

3. Optimal designs and semidefinite optimization 

First we shall give a very short introduction to semidefinite optimization to sum- 
marize the background necessary to keep this paper self-contained. The reader is 
also encouraged to consult [39]; or [40] for a considerably more in-depth survey to 
this vast field. 
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Semidefinite optimization (or semidefinite programming) is a generalization of 
the familiar linear optimization. A semidefinite program (or SDP for short) is the 
mathematical problem of finding the optimum of a linear function subject to the 
constraint that an affine combination of matrices is positive semidefinite. In other 
words, it is an optimization problem of the form 



where c G M n and A\ G § m , {% — 0, . . . , n) are given; Xi denotes the ith component 
of the vector x; these are the variables. 

Constraints of the above form are called semidefinite constraints or linear matrix 
inequalities. The format of problem (4) is regarded as a "standard form", but other, 
seemingly more general optimization problems that can be converted to the above 
form are also considered semidefinite programming problems. In particular, multiple 
semidefinite constraints can be added to the problem, and the constraints can be 
augmented by linear inequalities and equations, as these translate to constraints 
on diagonal matrices. Matrices of variables can also be considered, and constrained 
simultaneously in the form L(X) )p C\ here X is the matrix of variables, L is a linear 
operator, and C is a matrix of appropriate size. More generally, the maximization 
of every semidefinite representable function over every semidefinite representable set 
(as defined in the previous section) can be cast as an SDP. In this paper we will 
show that finding the support of the optimal design can be cast as an SDP of this 
more general form, for every regression model (1). 

Semidefinite programs are special convex optimization problems, and the stan- 
dard duality theory of convex optimization [34, 35] applies to them. Algorithms to 
numerically compute the optimal solutions of a semidefinite program have been well 
studied for more than two decades; SDPs involving tens of thousands of variables 
are routinely solved in the literature [39]. The SDPs of this paper are considerably 
smaller; they can be solved in a fraction of a second without any numerical issues by 
commonly used SDP solver software, such as SeDuMi [37], a freely available Matlab 
toolbox. Additional toolboxes, such as CVX [17] and YALMIP [27], are available 
to translate "high-level" semidefinite programs involving semidefinite functions such 
as the optimality criteria mentioned in this paper to the semidefinite programs in 
the above "standard" form. 

3.1. Semidefinite representation of optimal designs. The main result in this 
section, and of the paper, is that the problem of finding an optimal design with 
respect to $ can be equivalently written as a semidefinite programming problem 
whenever the functions fi, i = 1, . . . , m and u are rational functions defined over a 
finite union X of closed intervals, and $ is a semidefinite representable function that 




n 



(4) 



subject to 




i=l 
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satisfies the mild technical condition that it is admissible with respect to the set of 
all Fisher information matrices. 

As mentioned in the Introduction, this is already known for design spaces X 
consisting of finitely many points, even for arbitrary {fi}. While it is not stated 
there in this general form, the following theorem is implicit in [4, Chapter 7]: 

Theorem 8 ([4, Chapter 7]). Let X C R be finite, and $ be a semidefinite rep- 
resentable function compatible with the Lowner partial order. Then the ^-optimal 
designs for model (1) are characterized as the set of optimal solutions to a semidef- 
inite programming problem. 

In this semidefinite programming problem the support points are fixed param- 
eters, and the variables are the masses the optimal design assigns to the support 
points; hence Theorem 8 allows us to find the optimal design only once its support 
is known. Treating the support points as variables would be problematic for two 
reasons: the number of support points for the optimal design may not be known, 
and even if it was, the resulting optimization problem would be intractable. Our 
goal in this paper is to characterize the support of the optimal design as a solution 
of a semidefinite program. In the optimization problem we are about to define, the 
variables are the coefficients of a polynomial whose roots are the support points of 
the optimal design. 

Our main result, Theorem 9 below, is the characterization of the support of the 
optimal design as a solution of a semidefinite program. After finding the support, 
Theorem 8 can be applied to find the weights — by solving another semidefinite pro- 
gram. 

Theorem 9. Suppose that in the linear model (1) X is a finite union of closed inter- 
vals, the functions fi are rational functions with finite values on X, and u is a non- 
negative rational function on X. Let $ be an admissible semidefinite representable 
function (with representation (3)) with respect to the set of Fisher information ma- 
trices M = conv {f(t)f(t) T u)(t) 1 1 G X}. Then the support of the ^-optimal design 
is a subset of the real zeros of the polynomial ir obtained by solving the following 
semidefinite programming problem: 

minimize y (5a) 

Wi,...,W p e§\ 

p p 

subjectto ^,^ = -1, J]C*(^) = 0, (5b) 

i=l i=l 

K = U{y,W 1 ,...,W p ), (5c) 

TT G Pi (5d) 

where d is the degree of the polynomial 

t lcm(den(u;), den(/ 1 2 ), . . . , den(/ p 2 )) fy - ^(W i5 ^(M(£ t )) + A)) , (6) 
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whose coefficient vector is denoted by H(y, Wi, . . . , W p ) in (5c) above. 

Note that the operator II in (6) is affine, hence aside from (5d) every constraint 
in (5) is a linear equation or linear matrix inequality. Furthermore, (5d) can be 
translated to linear matrix inequalities using Lemma 2. Hence, (5) is indeed a 
semidefinite program. 

Not wanting to defer the discussion of examples and extensions, the proof was 
moved to the Appendix. Instead, we discuss a few examples. 



We start with two detailed examples demonstrating how E- and A-optimal design 
problems translate to semidefinite optimization models. Then the numerical robust- 
ness of the proposed method is investigated using a high degree polynomial model. 
Finally, an example with rational models is shown, in which the parameters of point 
sources emitting radiation are estimated from measurements of total intensity. 

All timing results in this paper were obtained using the semidefinite solver SeDuMi 
[37] running on an ordinary desktop computer with a 2.83GHz processor, using a 
single core. 

Example 10 (E-optimal designs without an intercept). This problem was 
considered in [5], and we use it here to illustrate the steps of the approach and to 
verify the correctness of our model in a relatively high degree model that has been 
solved: X = [—1,1], /j = t\ i = l,...,m, and a; is a positive constant. Using 
the semidefinite representation of E-optimality given in Example 5, the variables in 
the optimization model of Theorem 9 are the scalar y and the positive semidefinite 
matrix W\ of order m. The constraints can be derived as follows: from Example 5 
we have A x = id, B 1 = —I, C\ = 0, and D x = 0, hence (Bi,Wi) = — tr(Wi), and 
(d,Wi) = 0. Also note that 



where the last matrix has t l+ i as its (i, j)-th entry. 

Hence, the first constraint of (5b) is tr(H / i ) = 1, whereas the second constraint 
of (5b) is simply = 0, and can be omitted. We have deg(7r) = 2m, and the 
correspondence between the entries of W± and the coefficients of ir(t) = Y^qP^i 
given in (6), simplifies to the system of equations 



4. Examples 





i+j=k 
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cvx_begin 
m = 8; 
variable y; 

variable W(m,m) symmetric; 
variable pi(2*m+l); 

minimize y; 
subject to 

W == semidef inite (m) ; 

trace (W) == 1; 

pi(l) == y; 

pi(2) == 0; 

-pi(3) == W(l,l) ; 

-pi (17) == WC8.8) ; 

pi (end: -1 : 1) == nonneg_poly_coef f s (2*m, [-1,1]); 

cvx_end 

Figure 1. Matlab solution for Example 10 using the CVX toolbox. 
Note that Matlab indexes vectors starting from 1 instead of 0. The 
equations defining the coefficients 7r 4 through 7r 16 have been omitted 
for brevity. 

In summary, dropping the subscript from W\, we have the optimization problem 
minimize y 

subject to tr(W) = 1, 

7r = (y,0,S 2 ,...,S 2m )eP^ 1 \ 

where Sk = — Yli+j=k Wij {k = 2, . . . , 2m) are the anti-diagonal sums of the matrix 
W, and the constraint ir = (y, 0, S 2 , ■ ■ ■ , S2 m ) £ P^ 1,1 ! can be turned into the 
system of linear and semidefinite constraints (17) given in the Appendix, plugging 
in a = — 1,6=1. 

For practical computations several Matlab toolboxes, such as CVX [17] and 
YALMIP [27], are available to facilitate the translation of semidefinite programs 
such as the one above to the the so-called "standard form" required by semidefinite 
solvers. Rather than providing a detailed description or comparison of these pro- 
grams, we offer a completely self-explanatory example, the formulation of the above 
problem in the language of the CVX toolbox, in Figure 1. Note that both the trace 
constraint and the nonnegative polynomial constraint are represented at the same 
high level in the code as in the mathematical model above. They are translated to 
a standard form semidefinite program and solved using a semidefinite programming 
solver automatically by CVX, leaving virtually no work to the user. 
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For example, solving the resulting problem for m = 8, the optimal vector tt is the 
coefficient vector of a degree 16 polynomial whose real roots are: {±1, ±0.9207, ±0.693, ±0.3357}. 
It also has two imaginary roots. The eight real roots constitute the support of the 
E-optimal design. The same numerical example was considered in [5] with, of course, 
the same conclusion. The running time of SeDuMi in this example was 0.2 seconds. 
Example 11 (A heteroscedastic polynomial model). Consider the cubic model 
fi = f _1 , i — 1, . . . , 4, with heteroscedastic noise given by uj(t) = 1/(1 +t 2 ), over the 
design space [—5, 5]. We chose this arbitrary model because it is one of the simplest 
among those whose solution appears to not to be characterized. 

The A-optimal design is computed as follows. The parameters Ai,Bi,C{, Di in the 
semidefinite representation (3) of A-optimality are determined first from Example 6. 
Using this representation, the constraints of the semidefinite programming problem 
in Theorem 9 are compiled in the following way. 

• There are 4 semidefinite matrices Wx, • • • , W4 of order 5, and W5 is a non- 
negative scalar. 

• The first constraint of (5b) is simply W$ = 1. The second constraint of (5b) 
translates to (1^)5,5 = 1 for each i — 1, ... ,4. 

• We have deg(7r) = 6, and comparing the coefficients on the two sides of (6), 
we obtain a linear system of equations and matrix inequalities for (5c) and 
(5d), along the same lines as in the previous example. 

The optimal solution is a polynomial whose real roots are {±5, ±0.854}, this is the 
support of the A-optimal design. 

In the remaining examples we shall refrain from the detailed list of the above steps, 
and concentrate on the main features of the models and the numerical results. 

Example 12 (Polynomial models of high degree). We now consider the prob- 
lem of designing experiments for very high degree polynomial models in order to 
test the numerical stability and scalability of our approach. Models involving high 
degree polynomials are rarely justifiable, but they are good problems to test nu- 
merical stability, as they are notoriously ill-conditioned. For example, in the basic 
polynomial model, when /, = t % ~ x ,i = l,...,n, the the Fisher information matrix 
M(£) in (2) becomes a Hankel matrix, which is known to be ill-conditioned [38]. 
Also note that in the case of rational models, the polynomial defined by (6) might 
also have a degree that is considerably higher than the degree of the numerators 
and denominators of the functions fi, leading to potentially ill-conditioned opti- 
mization models. The numerical difficulties can be somewhat alleviated by using 
an orthogonal polynomial basis in (1). In this example we look for the E-optimal 
polynomial design in the ordinary polynomial regression model, but using the Le- 
gendre polynomial basis: = Pi-x, the (i — l)-st Legendre polynomial defined by 
P = l,Pi(t) = t, and P n+1 (t) = (n + l)P n +x{t) + (2n + l)tP n (t) - nP„_i(t) for 
n > 1. 
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The constraints are obtained along the same lines as in Example 10, except that 
the coefficients of 7r in (6) need to be changed as the moment matrix M(£t) changes 
with the change of basis. 

We solved the resulting semidefinite program for the degree 20 model; the compu- 
tation required 0.4 seconds. The optimal polynomial ir is a nonnegative polynomial 
on [—1, 1] with single roots at the endpoints ±1, and double real roots at the points 

{±0.981, ±0.937, ±0.872, ±0.788, ±0.686, ±0.568, ±0.438, ±0.297, ±0.150, 0}. 

The E-optimal design is supported on these 21 points. 

We remark that the use of high degree polynomials can also be circumvented using 
polynomial splines, which allow for the same large number of parameters without 
numerical difficulties; this will the subject of a forthcoming paper. 

Finally, we present an example using rational functions. 

Example 13 (Measuring radiation parameters). Consider the measurement 
of total radiation emitted from point sources, whose intensity obeys the inverse 
square law: h{r) = 9ir~ 2 where li is the intensity of the radiation emitted by source 
% measured at distance r from the source, for some unknown parameter 9{. The 
locations x\ of the sources are known. The response variable in our model (1) is 
the total radiation. To be estimated are the values 8i, affected by parameters of 
sources, shielding between the sources and detector, and several other factors. In 
this numerical example we consider a simple one-dimensional instance: the locations 
of the three sources are x\ = —2, x 2 = 2, x 3 = 4, and we are interested in the 
effective values of as measurable in the interval [—1,1], where the variance of the 
measurement error and the parameters are assumed to be constant. 

The distance of a detector at t from the ith point source is = \t — Xi\, so in our 
model (1) we have fa = r^ 2 = (t — Xi)~ 2 ,i = 1,2,3, and X = [—1, 1]. The solution 
of the semidefinite program, which took 0.2 seconds, yielded a three-point support 
for the E-optimal design: { — 1, 0.231, 1}. 

5. Reconstructing the optimal design 

Once we obtained a non-zero polynomial tt from the optimal solution of (19), we 
can find the optimal design by solving a second semidefinite programming problem, 
using Theorem 8. But Theorem 9 is only useful if the polynomial n in the optimal 
solution is not the zero polynomial. As the following example shows, in sufficiently 
degenerate cases it might be. 

Example 14. Consider the E-optimal design problem when m — 2, f(t) = (l,t) T , 
u = 1, and X = [—1,1]. Then the corresponding semidefinite programming problem 
simplifies to 

min y s.t. W ^ 0, tr(W) = 1, tt = (y - W u , -2W 12 , -W 22 ) G P M ' 1] , 

y,W 

by essentially the same calculations as in Example 10. It is not hard to see that the 
set of optimal solutions to this problem is {(y, W) \ y = 1, W\ 2 = 0, Wu + W 22 = 



14 



dAvid papp 



1,0 < Wu < 1}. Hence, we have infinitely many solutions, including Wu = 1 — 
W22 = 1, which corresponds to ir(t) = 0. Choosing any other optimal solution yields 
a polynomial whose roots are the expected t = ±1. 

Alternatively, we can change / to a different basis of degree one polynomials. 
This does not really change the model, however, if we choose, for example, f(t) = 
(a,t) T for any a > 1, the above problem disappears: the resulting semidefinite 
programming problem has a unique optimal solution, and that solution corresponds 
to a nonzero polynomial n, with two real roots. 

In the rest of the section we list a number of sufficient conditions that ensure that 
the optimal it in (5) is not the zero polynomial. The first one is perhaps the most 
obvious one. 

Lemma 15. Let fi, . . . , f m and cu in (1) be chosen such that 1 ^ spanjw/j/j | 1 < 
i < j < m}. Then no solution satisfying the constraints of (5) has n = 0. 

Special cases covered by this lemma include designs for incomplete polynomial 
models with no intercept, such as those considered in [22] and [5], and models 
involving rational functions, such as Example 13 above. 

The last observation of Example 14 also generalizes to E-optimal designs for ar- 
bitrary polynomial systems. 

Lemma 16. Consider the E-optimal design problem for a polynomial model with at 
least two parameters to be estimated. By choosing an appropriate basis {fi, . . . , f m } 
in (1) it can be guaranteed that no optimal solution of (5) has ir = 0. 

Proof. Let (y, W, fr) be an optimal solution to (5). Then W = Y T Y for some matrix 
Y, and the polynomial q: t — > (W,M(£ t )) can be written as q{t) = z(t) T z(t) with 
z(t) = Y f{t). Consequently, q can only be a constant (and ft can only be the zero 
polynomial) if z(t) = Yf(t) is componentwise constant. 

If 1 ^ span{/!, . . . , f m }, then this is impossible, because Y = is excluded by the 
constraints (5b), which simplifies to tr(jy) = 1 for E-optimal designs. 

If 1 G spanj/x, . . . , f m }, then we can assume without loss of generality that f\ = 1. 
Now q can be a constant only if Wu = 1, and every other entry of W is zero, making 
q(t) = 1 and y — 1. Replacing fi, i > 2 by Xf\ with a sufficiently small positive A 
that satisfies X\fi(t)\ < 1 for all t e I ensures that this is not the optimal solution 
to (5). □ 

A similar argument applies to A-optimal designs for polynomial models. For 
brevity we omit the details. As above, one can argue that by scaling the non- 
constant basis functions, solutions to the semidefinite programming problem that 
yield constant zero n cannot be optimal. 

Lemma 17. Consider the A-optimal design problem for a polynomial model with at 
least two parameters to be estimated. By choosing an appropriate basis • • • , /m} 
in (1) it can be guaranteed that no optimal solution of (5) has ir = 0. 
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Finally, as a corollary to Theorem 9, we also obtain an upper bound on the size 
of the support set of the minimally-supported optimal designs. 

Corollary 18. Let and d u be the degree of the numerator and denominator of 
cu, Hi and be the degree of the numerator and denominator of f i; and d& eri = 
\cm(d UJ , d\, . . . , dp). Furthermore, suppose that X is the union of ki + k 2 disjoint 
closed intervals, k\ of which are singletons. (The remaining k 2 intervals have distinct 
endpoints.) Then for every admissible criterion <3> for which the optimal solution to 
(5) does not have tt = there is a ^-optimal design supported on not more than 
min(|(/c! + 2k 2 + deg tt), deg tt) points, where deg n = <i den + (n u — d u + 2 maxj(rij — 
di))+- 

Proof. We need to count the number of distinct zeros of the polynomial tt in (5). 
On one hand, tt cannot have more than deg7r roots. On the other hand, since tt 
is nonnegative over X, each of its zeros must be either an endpoint of an interval 
constituting X or a root of multiplicity at least two. Hence the number of distinct 
zeros of tt is at most \{k\ + 2k 2 + deg7r). Finally, the expression for deg7r comes 
directly from (6). □ 

6. Parameter subsystems, estimability 

Often the experimenter is not interested in the entire parameter vector 9, but 
rather in a subset of them, or more generally in s < m specific linear combinations 
of the parameters: kj9, j = l,...,s. Let K be the matrix whose columns are 
kx,...,k s ; so far we have assumed s = m and K = I. An application of this 
more general setting is polynomial regression, when the experimenter needs to test 
whether the highest degree terms in the model are indeed non-zero. 

It can assumed without loss of generality that K has full (column) rank, and to 
make the problem meaningful, it must be assumed that the parameters K T 9 are 
estimable, that is, 

range(il~) C range(M), (7) 

see for example [33, Chapter 3]. In this setting the matrix M is replaced by the 
information matrix (K T M Ji K)~ 1 , where M' denotes the Moore-Penrose pseudo- 
inverse of M. In particular, the optimal design is a probability measure £ that 
maximizes the matrix (K T (£) K)' 1 , or the function £ ->■ &((K T (^K)^) for 
some criterion function $ compatible with the Lowner partial order. 

The optimization models for this setting can be developed analogously to the 
model of the previous section. Since $ is assumed to be compatible with the Lowner 
partial order, maxMe^ $((il" T M t (£)il")- 1 ) is equivalent to 

max{$(F) \MeM, {K t M ] K)- 1 fc= Y ^ 0}. (8) 

Note that the optimum does not change if we require Y to be positive definite, in 
which case the last two inequalities are equivalent to Y^ 1 )p K T M^K. We shall 
use now a Schur complement characterization of semidefinite matrices, which is a 
generalization of the result used in Example 6. 
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Proposition 19 ([41, Theorem 1.20]). The symmetric block matrix (^-t ^) is pos- 
itive semidefinite if and only if M )>= 0, Z )>= K T M^K, and range(-fT) C range(M). 

By this proposition, (8) is equivalent to 

max{$(F) | M G af , Y *,) ^ 0}. 

Using Schur complements again, the inversion from the last inequality can be elim- 
inated, and we obtain the following equivalent optimization problem: 

max{$(Y) | M G M , M )p KYK T , Y ^ 0}. (9) 

Finally, we can simplify this problem essentially identically to how we obtained 
(5) from (18). Doing so we obtain the following. 

Theorem 20. Consider the linear model (1) and a matrix K G IR mxs satisfying 
ik(K) = s and the estimability condition (7). Then for every semidefinite repre- 
sentable criterion function $ a polynomial n whose real zeros contain the support 
of a $- optimal design for the parameter vector K T 6 is an optimal solution of the 
following semidefinite program: 

minimize y 

w x ,...,w p e$ k + 



subject to K T VK ^ A *(Wi), 



i=l 



j]<Wi ) B i > = -i, £cj(Wi) = o, 



i=l 



n = U(y,V,W 1 ,...,W p )eP x , 
where A iy B iy C{ and Di come from Definition 3, and d is the degree of the polynomial 

t lcm(den(o;), den(/ 1 2 ), . . . , den(/ 2 )) (y - (V, M(&)> - ^(W,, A)) , 



8=1 



whose coefficient vector is denoted by U(y, V, W±, . . . , W p ). 

We omit the rest of the proof as it is essentially identical to that of Theorem 9, 
given in the Appendix. The main difference is the appearance of the variable V, 
which is the dual variable of the constraint M )p KYK T . 

7. Locally optimal designs for nonlinear models 

In this section we show how to apply Theorems 9 and 20 to find locally optimal 
designs (with respect to various optimality criteria) for nonlinear rational models 
(see the definition and its motivation below). We consider the general nonlinear 
model 

y(t) = f(t;9) + N(0,a(t)), tel, (10) 
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where / is a rational function of (t;9), 9 is an m- vector of unknown parameters. 
The designs space X is the union of finitely many closed intervals, as before. 

Nonlinear regression models are widely used and researched, but finding optimal 
designs for nonlinear regression is particularly challenging - so much so that even 
numerical solutions to simple two- and three-variable models are highly non-trivial 
to obtain. (See for example [9] for recent results on a number of models used in 
dose-finding studies, and [26] for pharmacokinetic models.) Nonlinear rational mod- 
els (where the response variable is a rational function of the explanatory variable 
and the unknown parameters) and models involving exponential functions and log- 
arithms are particularly well studied. Imhof and Studden [24] considered E-optimal 
designs for different classes of rational models. More recently, Dette et al. [10] 
investigated E-optimal designs for a more general family of functions (not only ra- 
tional functions), under the assumption that some partial derivatives of the model 
function form a weak Chebyshev system [25]. Note that this class of problems is 
not comparable to the rational models we are considering: the partial derivatives 
of many non-rational functions satisfy this criterion, but many rational models, for 
instance, the E max model from Example 21 below, are outside that class. 

Perhaps the most fundamental complication in designing non-sequential experi- 
ments for nonlinear models is in the formulation of the problem as a meaningful 
optimization problem. For a nonlinear regression model (10) the Fisher information 
matrix corresponding to the design £ is 



It is immediate that (unlike in the linear case) the Fisher information matrix for 
nonlinear models depends on the parameters whose estimation is the purpose of the 
experiments we are to design. Hence defining the optimal designs as the optimizers of 
the M(£, 9) is meaningless. Nevertheless, if the experimenter can guess reasonable 
values of the parameters, it can be useful to design the experiment that would 
be optimal if the guessed parameters were correct. Some more advanced design 
methods, such as sequential designs [16] also build on the same concept, often called 
locally optimal designs. (The same ideas can also be used for the estimation of 
nonlinear functions of the parameters of a linear model.) Before considering the 
general case, let us look at a simple example that we shall readily generalize below. 

Example 21. Consider the three-parameter E max model 





y{t) = o + 



0it 



+ N(0,1) 



(12) 



t + 9 2 



from the dose-finding study [9]. With the notation of (11) 
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so for every fixed value (9q, 0*, 9%) of # the integrand in the Fisher information matrix 
(11) can be written as 

/ 1 

t 

t+6* 
__0p 

\ WW 

which is the same information matrix as the information matrix of the parameter 
vector (ao, ai, CK2) for the linear model 

t 9*t 

y(t) =a + aij^; + <* 2 + * + N(0, 1). (14) 

Hence, finding locally optimal designs for the E max model (12) is equivalent to 
finding optimal designs for the linear model (14), which is a linear model with 
rational functions, hence Theorem 9 is applicable. 

A further simplification is possible: we can find an equivalent polynomial model, 
and use Theorem 20 to find optimal designs. It is easy to verify that the matrix 
(13) can also be written as 

/ 1 XX 2 
K T \ X X 2 X 3 

\x 2 x 3 x 4 

/1 1 \ 

with x — (t + #2) 1 an d K — I ~ e 2 ~ * . Hence, for every fixed 9* the Fisher 

\ S l 6 2 J 

information matrix of the design £ for model (12) is identical to the Fisher informa- 
tion matrix of the design that puts £(£j) mass to the point x% = (t% + f° r ^ ne 
three-parameter linear model 

V(X) = "0 + aix + a 2X 2 + N(0, 1) (15) 

and the parameter vector K T a = (ao, «o — ai^i tt2^i^2 — ai#i) T - Now the problem 
is reduced to polynomial regression, and Theorem 20 is applicable. 

Generally, for a nonlinear regression model (10) with m parameters, the problem 
of finding a locally optimal design for a given parameter vector 9* is equivalent 
to finding the optimal design for the associated linear model of the form (1) with 
fi = (df)/(d9i)\ e=et , % — 1, . . . ,m. If / is a rational function of (t,9), then so are 
its partial derivatives. Hence the equivalent linear model (for every fixed value of 
9) is always one with rational functions fi. 

The same observation was used in [10] to derive E-optimal designs for the class 
of nonlinear regression models where the partial derivatives form a weak Chebyshev 
system. Now this assumption can be dropped, and other optimality criteria can also 
be considered. 



t+e: 



WW 

(t+0?)3 

WW 



) 



(13) 
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8. Optimal designs in other functional spaces 

Most of Section 3 applies to every and u, not only to rational functions; for 
example, (22) is not specific to polynomials or rational functions. As long as the set 
of constraints (23) can be expressed by finitely many semidefinite constraints (or in 
any other computationally tractable manner), the same approach works. Examples 
include the following (we consider only the homoscedastic case for simplicity): 

(1) fi(t) = cos(it) for every j e N and t; 

(2) f2i(t) = cos(it), f2i+i(t) = sin(zt) for every % 6 N, and t] 

(3) fi(t) = exp(zt) for every i e N and t. 

These three examples, however, do not truly generalize the approach of Section 3, 
since they can also be reduced to the polynomial case by an appropriate change of 
variables. (We omit the details.) 

Our estimate on the number of support points is also valid for some functional 
spaces other than polynomials. The only property of polynomials that we used were 
that their degree bounds the number of their roots (counted with multiplicity: roots 
in the interior of the domain have multiplicity two). Hence, bounds similar to the 
one in Corollary 18 can be obtained for models where the functions {oofifj\i,j} form 
a Chebyshev system. 

9. Discussion 

Computing optimal designs for linear models involving rational functions is easy 
when the design space is finite, hence the key difficulty in obtaining optimal designs 
for infinite design spaces, such as intervals or unions of intervals, is that the finite 
support of an optimal design has to be determined. Symbolic or closed form solutions 
are unavailable for most models, and their scope is often limited by assumptions that 
are neither technical, nor have any statistical interpretation. In this paper, we have 
presented a method that does not rely on such assumptions. It is an effective method 
to determine a polynomial whose zeros contain the support of the optimal designs. 
The method is applicable to every linear regression problem involving only rational 
functions; it treats D-, A-, E-, and general <3> p optimal designs in a unified manner, 
and generalizes to the heteroscedastic case if the variance of the noise is a positive 
rational function. The design space can be an interval or the union of finitely many 
intervals. 

This level of generality is far greater than what appears to be possible by closed- 
form approaches. It is achieved at the price of providing numerical, rather than 
symbolic, solutions: the method generates the (numerical) coefficients of the sought 
polynomial. The main step of the method is the solution of a semidefinite program- 
ming problem, which can be done (to high precision) with readily available soft- 
ware in trivial running time. Unlike other iterative methods previously proposed 
in the literature, including all of those based on coordinate descent, semidefinite 
programming algorithms have a theoretically guaranteed low running time, and are 
guaranteed to find the globally optimal design, rather than a local optimum. This 
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is of considerable practical importance, with the potential for instance to impact 
design software development. 

Further study of the optimality conditions of the main optimization model might 
also yield new theoretical insights. 

Through a number of examples we have demonstrated the flexibility of the pro- 
posed method, and we also found that the algorithm is robust enough to handle 
ill-conditioned problems involving high-degree polynomials, and yields solutions in 
a fraction of a second for problems of practical size. 

A corollary of our main theorem is a bound on the size of the support set, and an 
analogous optimization model for the estimation of parameter subsystems. 

Most results of this paper readily generalize to linear models involving certain 
exponential families rather than rational functions; these include Fourier regression, 
where the model is a trigonometric polynomial with unknown coefficients. The 
method may also be used to find locally optimal designs for nonlinear models. In 
this area almost no symbolic solutions are available, but model-specific numerical 
methods are abound. Details are available from the author, and may be subject of 
a future paper. 

A few important questions remain open. The first one is how to extend the 
results of Section 5. Since the optimal solution to the problem (5) is sensitive to 
both the representation of the optimality criterion $ and also to the basis 
of the space of regression functions (meaning that equivalent representations of 
$ and basis transformations lead to different optimal solutions), one may readily 
conjecture that for every model (1) and for every admissible optimality criterion one 
can find an equivalent model (that is, a basis of the same functional space) and 
a semidefinite representation (3) for $ such that the optimal is in every solution of 
(5) is nonzero. 

Another subject of future research may be the generalization of our results to 
larger classes of functions. Chebyshev systems are natural candidates to look at, 
but more importantly, the ideas of the paper would generalize word by word to 
every family (/ 1; . . . , f m ) and weight function u for which functions in the space 
span{ufifj\i,j} are easy to maximize. Hence, identifying such spaces of functions 
would be particularly important. 

Finally, the ability to design experiments in a discontinuous design space is ex- 
tremely relevant in practice, especially in the multivariate case (e.g., when measure- 
ments cannot be taken at inaccessible locations, or are practically impossible very 
close to signal sources). Existing models with closed-form solutions are not applica- 
ble, and most of the current numerical methods cannot address this problem even 
in the univariate case, aside from sporadic results involving two disjoint intervals 
for a few concrete models. 

The applicability of the proposed method in the multivariate setting also requires 
further study. 
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Appendix A. The semidefinite representability of polynomials over 



For aACi let T>£ denote the set of de gree n polynomials nonnegative over A. 
The following representation of nonnegative polynomials is well-known: 

Proposition 22 ([28]). For every polynomial p of degree n, p e Vn' b ^ if and only if 



for some polynomials r and s of degree k and q of degree k — 1 . 

On the other hand, functions expressible as sums of squares of functions from a 
given finite dimensional functional space (such as polynomials of a fixed degree) are 
semidefinite representable; see [30] for a constructive proof of this claim. Applying 
this construction to part (2) of Proposition 22 yields the following. 
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Proposition 23 ([30]). Suppose p is a polynomial of degree n = 2m + 1, p(t) = 
^}Hk=oPkt k ' , and let a < b are real numbers. Then p £ Vn^ if and only if there exist 
positive semidefinite matrices X = (xij)™j =0 and Y = (yij)ij=o satisfying 

Pk= ^2 (~ ax v + b vv) + ( Xj >i ~ ( 16 ) 

i+j=k i+j=k—l 

for all k = 0, . . . , 2m + 1. 

Similarly, if p is a polynomial of degree n = 2m, then p £ V n if and only if 
there exist positive semidefinite matrices X = (xjj)™ =0 and Y = (yij)™^ satisfying 

Pk= ^2 ( Xi i ~ abyi ^ + ( a + b ^ ~ Vv ( 17 ) 

i+j=k i+j=k~l i+j=k—2 

for all k = 0, . . . , 2m. 

This is rather involved (and the details are only important for the purposes of 
actual computations), but close inspection reveals that this proposition characterizes 
linear image of the Cartesian product of two semidefinite cones, thus, it 
proves the semidefinite representability of Vn in the sense of Definition 1. 

Since the intersection of semidefinite representable sets are also semidefinite rep- 
resentable, it follows that is semidefinite representable for every union of finitely 
many closed intervals X. 



Appendix B. Proof of Theorem 9 

Consider the problem of finding max{$(M(£)) | £ £ S(Z)}, where S(X) is the set 
of probability measures on X with finite support, and M is the Fisher information 
matrix defined by (2). Considering the Fisher information as the variable, this can 
be expressed as a finite dimensional optimization problem: 

max{$(M) | M £ M}, where M = {M(f) | ££ H(X)}. (18) 

Let £t be the probability measure that assigns all of its mass to t £ X. Because X is 
assumed to be compact and the mapping t — > M(£ t ) is continuous, {M(£ 4 ) 1 1 £ X} 
is compact. Hence, M = conv{M(£ t ) 1 1 £ X} is a convex compact set, and the 
optimization problem (18) is well-defined: The maximum is finite, and is attained 
(for every continuous function $). 

Now let us assume that $ is semidefinite representable. Then using the notations 
of Definition 3, problem (18) may be written as follows. 

maximize z 

subject to Ai(M) + B^z + Ci(u) + A ^ 0, i = l,...,p, 

where Ai,Bi,Ci, and Di are the functions and matrices as in Definition 3. 

Because is a closed convex cone, (19) is equivalent to the following Lagrangian 
relaxation (in which the dual variable Wi is the Lagrange multiplier associated with 
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the ith constraint): 

v 

max inf z + V<Wi, MM) + B i( z + CAu) + A) (20) 

(i=l,...,p) 1=1 

Suppose that $ is admissible with respect to 5W . Then the optimization problem 
(19) has a Slater point, consequently its optimum is equal to optimum of its dual 
problem [35, Chapter 4], obtained by replacing the "max inf" by "min sup" in the 
Lagrangian (20). This dual problem then can be simplified as follows [C* denotes 
the dual operator of Ci): 



p 



min sup z + Y^(Wi, A^M) + B { z + Ci{u) + A) 
w u ...,w p ^o z>u>MeM j~f 



i=l 



= min sup z\l + S^(W h Bi) 
w u ...,w P >o z , u ,MeM \ j^C 

p p 

+ £<C?(Wi), u) + J^iWi, MM) + Di) 

i=l i=l 
P 

= min sup ^2(W U MM) + Di) (21) 

E i (Wi,B i >=-i i=1 

p 

= min max Y^Wi, A(M) + A)- 

Wx,...,W p )pQ Men* 

(The last equation simply means that the supremum is attained.) 

Finally, with the help of a dummy variable y the optimization problem in the last 
line can be conveniently written as: 

minimize y 

Wi,..., w p & k 

subject to Wi )p i = 1, . . . ,p, 

£(Wi,B 4 ) = -l, ^C*(W,) = 0, ( 22 ) 

i=l i=l 
P 

y > ^2(Wi, MM) + A) VMeaf. 

i=l 

Aside from the last set of constraints, which is an uncountably infinite collection 
of linear inequalities, every constraint is either a linear equality or a linear matrix 
inequality on the variables Wi. Using that M = conv{M(£ t ) 1 1 e X}, the last set of 
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constraints can also be simplified to 

p 

y-^iW^MMfoV + Di) >0 Vtel (23) 

i=l 

Since M(£t) is a matrix whose entries are rational functions of t, this inequality 
expresses the nonnegativity of a rational function (over X) that lives in the space 

V = span ({w/Jj \ i,j = 1, . . .,m} U {1}) , 

with variable coefficients. Multiplying both sides with the least common denomi- 
nator of the functions oofifj (which is positive on X) turns (23) to the equivalent 
inequality (5d) with II defined in (6), giving us (5). 

Finally, suppose (y, n, W±, . . . , W p ) is an optimal solution to (5). Then, since (5), 
(21), and (22) are equivalent, (y, W±, . . . , W p ) is also an optimal solution to (22), and 
because the optimum in (21) is attained, there exists an M G M that satisfies the 
last constraint of (22) with inequality. The way we obtained (22) from (18) ensures 
that this M is also an optimal solution to our original problem (18). Suppose 
M = M(£) for some measure £ G S(X) that is concentrated on . . . , t^} C X and 
assigns weight Aj to ti, i — 1, . . . , k. Then with the optimal y and Wi, . . . W p each 
of these tj must satisfy (23) with equality. Consequently each ti is a root of n. □ 
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